Read February 28, 1907.)

1. The systems of notation hitherto used by writers on the theory of 
correlation are somewhat unsatisfactory when many variables are involved. 
In the present paper a new notation is proposed which is simple, definite, and 
quite general, thus very greatly facilitating the treatment of the subject. The
majority of the results given in the sequel were, in fact, first suggested by 
the notation itself. 

2. Let $x_1, x_2, \ldots, x_n$ denote deviations in the values of the $n$ variables from
their respective arithmetic means. Then the regression equation may be
written:

$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n. \tag{1}$$

In this notation the suffix of each regression coefficient completely defines 
it. The first subscript gives the dependent variable, the second the variable 
of which the given regression is the coefficient, and the subscripts after the 
period show the remaining independent variables which enter into the 
equation. It is convenient to distinguish the subscripts before and after the 
period as " primary " and " secondary " subscripts respectively. The order in 
which the secondary subscripts are arranged is indifferent, but the order of the 
two primary subscripts is material; e.g., $b_{12.3\ldots n}$ and $b_{21.3\ldots n}$ denote two quite
distinct coefficients. A coefficient with $p$ secondary subscripts may be termed
a regression of the $p$th order, the total regressions $b_{12}$, $b_{13}$, $b_{23}$, etc., being thus
regarded as of order zero. 

3. The correlation-coefficients may be distinguished by subscripts in 
precisely the same manner. Thus the correlation $r_{12.34\ldots n}$ is defined by the
relation

$$r_{12.34\ldots n} = (b_{12.34\ldots n}\,b_{21.34\ldots n})^{1/2}. \tag{2}$$

In the case of the correlations, the order of both primary and secondary 
subscripts is indifferent. A correlation with p secondary subscripts may be 
termed a correlation of order $p$, the total correlations $r_{12}$, $r_{13}$, $r_{23}$, etc., being
regarded as of order zero. 

4. If the regressions in equation (1) be determined as usual by the method 
of least squares, the difference between $x_1$ and the expression on the right, for
any observed set of values of $x_1, x_2, \ldots, x_n$, may be denoted by $x_{1.23\ldots n}$; that is

$$x_{1.23\ldots n} = x_1 - b_{12.34\ldots n}\,x_2 - b_{13.24\ldots n}\,x_3 - \cdots - b_{1n.23\ldots(n-1)}\,x_n. \tag{3}$$

Such a residual, or deviation, denoted by a symbol with $p$ secondary
subscripts may be termed a deviation of the $p$th order, $x_1, x_2, \ldots, x_n$ being regarded
as deviations of order zero.

5. Finally, the standard deviation $\sigma_{1.23\ldots n}$ is defined as given by the relation

$$N\,\sigma^2_{1.23\ldots n} = \Sigma(x^2_{1.23\ldots n}), \tag{4}$$

$N$ being the number of observations. If the standard deviation be denoted
by a symbol with $p$ secondary subscripts, it is of the $p$th order, the total
standard deviations being regarded as of order zero.

6. In terms of this notation, the normal equations from which the regressions 
are determined may be very briefly written, in the form 

$$\Sigma(x_2\,x_{1.23\ldots n}) = \Sigma(x_3\,x_{1.23\ldots n}) = \cdots = \Sigma(x_n\,x_{1.23\ldots n}) = 0. \tag{5}$$

That is to say, we have the general theorem: "The product-sum of any
deviation of order zero with any deviation of higher order is zero, provided
the subscript of the former occur amongst the secondary subscripts of the
latter."
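
The theorem of § 6 can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic sample of three variables (the data, the seed and all variable names are illustrative assumptions, not taken from the paper). It fits $x_1$ on $x_2$ and $x_3$ by least squares and forms the product-sums of equation (5).

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic sample, assumed only for illustration
mix = np.array([[1.0, 0.5, 0.3],
                [0.0, 1.0, 0.4],
                [0.0, 0.0, 1.0]])
raw = rng.normal(size=(200, 3)) @ mix
x = raw - raw.mean(axis=0)      # deviations from the arithmetic means
x1, others = x[:, 0], x[:, 1:]  # x1 and the regressors x2, x3

# Least-squares regressions b12.3 and b13.2 of equation (1)
b, *_ = np.linalg.lstsq(others, x1, rcond=None)
x1_23 = x1 - others @ b         # deviation of the second order, equation (3)

# Product-sums S(x2 x1.23) and S(x3 x1.23) of equation (5)
product_sums = others.T @ x1_23
print(product_sums)             # both entries vanish up to rounding error
```

The two product-sums are the left-hand sides of the normal equations, so they vanish for any sample, whatever the form of the frequency distribution.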

7. It follows that the product-sum of any two deviations of the same order, 
with the same secondary suffixes, is unaltered by omitting any or all of the 
secondary subscripts of either and, conversely, the product-sum of any 
deviation of order $p$ with a deviation of order $p + q$, the $p$ subscripts being the
same in each case, is unaltered by adding to the secondary subscripts of the
former any or all of the $q$ additional subscripts of the latter, for we have
by § 6:

$$\Sigma(x_{1.34\ldots n}\,x_{2.34\ldots n}) = \Sigma(x_{1.34\ldots n})(x_2 - b_{23.4\ldots n}\,x_3 - \cdots - b_{2n.3\ldots(n-1)}\,x_n) = \Sigma(x_{1.34\ldots n}\,x_2).$$

Similarly

$$\Sigma(x_{1.34\ldots n}\,x_{2.34\ldots(n-1)}) = \Sigma(x_{1.34\ldots n}\,x_2),$$

and so on. Therefore, quite generally,

$$\Sigma(x_{1.34\ldots n}\,x_{2.34\ldots n}) = \Sigma(x_{1.34\ldots n}\,x_{2.34\ldots(n-1)}) = \cdots = \Sigma(x_{1.34\ldots n}\,x_2). \tag{6}$$

8. It follows from § 7 as a corollary from § 6 that the product-sum of any 
two deviations is zero if all the subscripts of the one are contained among the 
secondary subscripts of the other.

These theorems (§§ 6 — 8) give the key to simple deductions of many 
results in the theory of multiple correlation. 
9. We have from the last section and § 7,

$$\begin{aligned}
0 &= \Sigma(x_{2.34\ldots n}\,x_{1.234\ldots n}) \\
  &= \Sigma(x_{2.34\ldots n})(x_1 - b_{12.34\ldots n}\,x_2 - \text{terms in } x_3 \text{ to } x_n) \\
  &= \Sigma(x_1\,x_{2.34\ldots n}) - b_{12.34\ldots n}\,\Sigma(x_2\,x_{2.34\ldots n}) \\
  &= \Sigma(x_{1.34\ldots n}\,x_{2.34\ldots n}) - b_{12.34\ldots n}\,\Sigma(x^2_{2.34\ldots n}).
\end{aligned}$$

That is

$$b_{12.34\ldots n} = \frac{\Sigma(x_{1.34\ldots n}\,x_{2.34\ldots n})}{\Sigma(x^2_{2.34\ldots n})}. \tag{7}$$

But this is the value that would have been obtained by taking a regression
equation of the form

$$x_{1.34\ldots n} = b_{12.34\ldots n}\,x_{2.34\ldots n},$$

and determining $b_{12.34\ldots n}$ by the method of least squares. That is to say,
$b_{12.34\ldots n}$ may be regarded, quite generally and without any reference to the
form of the frequency distribution, as the regression of $x_{1.34\ldots n}$ on $x_{2.34\ldots n}$. It
follows at once from the definition (2) that $r_{12.34\ldots n}$ may be regarded as the
correlation between $x_{1.34\ldots n}$ and $x_{2.34\ldots n}$, and from (4) that we may write

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.34\ldots n}}{\sigma_{2.34\ldots n}}. \tag{8}$$

All the relations, in fact, that hold good between deviation-sums, standard
deviations, regressions and correlations of order zero, are also valid between
deviation-sums, standard deviations, regressions and correlations of any higher
order.
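
Equation (7) lends itself to a direct numerical check. The sketch below assumes NumPy and a synthetic three-variable sample; the helper `residual` and all variable names are hypothetical, introduced only for illustration. It compares the coefficient of $x_2$ in the regression of $x_1$ on $x_2$ and $x_3$ with the regression of $x_{1.3}$ on $x_{2.3}$.

```python
import numpy as np

def residual(y, X):
    """Least-squares residual of y on the columns of X (deviation form)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

rng = np.random.default_rng(1)  # synthetic sample, assumed only for illustration
raw = rng.normal(size=(300, 3))
raw[:, 0] += 0.8 * raw[:, 1] - 0.5 * raw[:, 2]
raw[:, 1] += 0.6 * raw[:, 2]
x = raw - raw.mean(axis=0)
x1, x2, x3 = x.T

# b12.3 as the coefficient of x2 in the regression of x1 on x2 and x3
b_full, *_ = np.linalg.lstsq(np.column_stack([x2, x3]), x1, rcond=None)
b12_3 = b_full[0]

# b12.3 from equation (7): the regression of x1.3 on x2.3
x1_3 = residual(x1, x3[:, None])
x2_3 = residual(x2, x3[:, None])
b12_3_from_residuals = (x1_3 @ x2_3) / (x2_3 @ x2_3)

print(b12_3, b12_3_from_residuals)  # the two values agree up to rounding
```

No distributional assumption enters the comparison; the agreement is an algebraic consequence of the normal equations.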

10. This result is of some importance as regards the interpretation of partial
correlations and regressions. In the case of normal correlation there is no
difficulty in assigning a meaning to these constants, as the regression is
strictly linear, and the partial correlations and regressions are the same for all
types of the variables. But in the general case this is not so, and although I
showed, in a previous discussion of the question,* that the values assigned to
the partial regressions on the assumption of normal correlation are the "least
square" values and, consequently, that the partial correlation retains an
"average significance," I could not prove that it remains an actual correlation
between determinate variables. The above theorem completes the work in this
respect. If, with three variables $x_1$, $x_2$, and $x_3$, for example, the two regressions
$b_{13}$ and $b_{23}$ be determined in the ordinary way, and then the residuals
$x_{1.3} = x_1 - b_{13}x_3$, $x_{2.3} = x_2 - b_{23}x_3$ be calculated for all sets of observations
$x'_1 x'_2 x'_3$, $x''_1 x''_2 x''_3$, etc., the correlation between $x_{1.3}$ and $x_{2.3}$ is $r_{12.3}$. A
similar interpretation holds for any greater number of variables.

* 'Roy. Soc. Proc.,' vol. 60 (1897), p. 477; 'Roy. Stat. Soc. Journ.,' vol. 60 (1897),
p. 812.

Such a relation would not, of course, afford a practical method of calculating
the partial coefficients, as the arithmetic would be extremely lengthy. 
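
Although lengthy by hand, the residual procedure of § 10 is a few lines by machine. The sketch below assumes NumPy and a synthetic sample (data and names are illustrative assumptions). It computes the correlation between the residuals $x_{1.3}$ and $x_{2.3}$ and compares it with the value of $r_{12.3}$ obtained from the three total correlations by the three-variable case of equation (14).

```python
import numpy as np

rng = np.random.default_rng(2)  # synthetic sample, assumed only for illustration
raw = rng.normal(size=(500, 3))
raw[:, 0] += 0.7 * raw[:, 2]
raw[:, 1] += 0.4 * raw[:, 2] + 0.3 * raw[:, 0]
x = raw - raw.mean(axis=0)
x1, x2, x3 = x.T

# Total regressions b13, b23 and the residuals x1.3, x2.3
b13 = (x1 @ x3) / (x3 @ x3)
b23 = (x2 @ x3) / (x3 @ x3)
x1_3 = x1 - b13 * x3
x2_3 = x2 - b23 * x3
r_between_residuals = np.corrcoef(x1_3, x2_3)[0, 1]

# r12.3 from the total correlations, equation (14) with n = 3
r = np.corrcoef(x, rowvar=False)
r12, r13, r23 = r[0, 1], r[0, 2], r[1, 2]
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

print(r_between_residuals, r12_3)  # equal up to rounding
```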

11. Any standard deviation of order p may be expressed in terms of a 
standard deviation of order p — 1 and a correlation of order p — 1. For we 
have, using the theorems of §§ 6 and 7, 

$$\begin{aligned}
\Sigma(x^2_{1.23\ldots n}) &= \Sigma(x_{1.23\ldots(n-1)}\,x_{1.23\ldots n}) \\
  &= \Sigma(x_{1.23\ldots(n-1)})(x_1 - b_{1n.23\ldots(n-1)}\,x_n - \text{terms in } x_2 \text{ to } x_{n-1}) \\
  &= \Sigma(x^2_{1.23\ldots(n-1)}) - b_{1n.23\ldots(n-1)}\,\Sigma(x_{1.23\ldots(n-1)}\,x_{n.23\ldots(n-1)}),
\end{aligned}$$

or, dividing through by the number of observations,

$$\begin{aligned}
\sigma^2_{1.23\ldots n} &= \sigma^2_{1.23\ldots(n-1)}\,(1 - b_{1n.23\ldots(n-1)}\,b_{n1.23\ldots(n-1)}) \\
  &= \sigma^2_{1.23\ldots(n-1)}\,(1 - r^2_{1n.23\ldots(n-1)}).
\end{aligned} \tag{9}$$

The form of this relation is the same as that of the familiar relation
between a standard deviation of the first order and a standard deviation of
order zero, with the secondary subscripts $23\ldots(n-1)$ added throughout. It is clear
from (9) that $r_{1n.23\ldots(n-1)}$ cannot be numerically greater than unity. It also
follows at once that if we have been estimating $x_1$ from $x_2, x_3, \ldots, x_{n-1}$, $x_n$ will
not increase the accuracy of estimate unless $r_{1n.23\ldots(n-1)}$ (not $r_{1n}$) differ from
zero.*
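
As a numerical illustration of equation (9), the sketch below (NumPy assumed; the synthetic data and the helper `residual` are hypothetical) takes three variables and checks that $\sigma^2_{1.23} = \sigma^2_{1.2}(1 - r^2_{13.2})$, where $r_{13.2}$ is computed as the correlation between the residuals $x_{1.2}$ and $x_{3.2}$.

```python
import numpy as np

def residual(y, X):
    """Least-squares residual of y on the columns of X (deviation form)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

rng = np.random.default_rng(3)  # synthetic sample, assumed only for illustration
raw = rng.normal(size=(400, 3))
raw[:, 0] += 0.5 * raw[:, 1] + 0.9 * raw[:, 2]
raw[:, 2] += 0.3 * raw[:, 1]
x = raw - raw.mean(axis=0)
x1, x2, x3 = x.T
n_obs = len(x1)

x1_2 = residual(x1, x2[:, None])                  # x1.2
x3_2 = residual(x3, x2[:, None])                  # x3.2
x1_23 = residual(x1, np.column_stack([x2, x3]))   # x1.23

var_1_2 = (x1_2 @ x1_2) / n_obs                   # sigma^2 of order one
var_1_23 = (x1_23 @ x1_23) / n_obs                # sigma^2 of order two
r13_2 = (x1_2 @ x3_2) / np.sqrt((x1_2 @ x1_2) * (x3_2 @ x3_2))

print(var_1_23, var_1_2 * (1 - r13_2**2))         # the two sides of equation (9)
```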

12. In equation (9) the subscript $n$ is eliminated from the suffix of $\sigma_{1.23\ldots n}$,
and it is obvious that any other subscript can be eliminated in the same way.
Therefore we must have

$$\sigma^2_{1.34\ldots n}\,(1 - r^2_{12.34\ldots n}) = \sigma^2_{1.24\ldots n}\,(1 - r^2_{13.24\ldots n}) = \cdots = \sigma^2_{1.23\ldots(n-1)}\,(1 - r^2_{1n.23\ldots(n-1)}). \tag{10}$$

Further, we have

$$\sigma^2_{1.23\ldots(n-1)} = \sigma^2_{1.23\ldots(n-2)}\,(1 - r^2_{1(n-1).23\ldots(n-2)}),$$
$$\sigma^2_{1.23\ldots(n-2)} = \sigma^2_{1.23\ldots(n-3)}\,(1 - r^2_{1(n-2).23\ldots(n-3)}),$$

and so on; so that

$$\sigma^2_{1.23\ldots n} = \sigma^2_1\,(1 - r^2_{12})(1 - r^2_{13.2})(1 - r^2_{14.23})\cdots(1 - r^2_{1n.23\ldots(n-1)}). \tag{11}$$

This is an extremely convenient expression for arithmetical use, as
illustrated later. A complete check on the arithmetic is obtained by
eliminating the secondary subscripts in a different, say the inverse, order, i.e.,
by using the result

$$\sigma^2_{1.23\ldots n} = \sigma^2_1\,(1 - r^2_{1n})(1 - r^2_{1(n-1).n})(1 - r^2_{1(n-2).n(n-1)})\cdots(1 - r^2_{12.34\ldots n}). \tag{12}$$

* Cf. proofs for cases of 3 and 4 variables previously given (loc. cit. in previous note).
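
The check described in § 12 can be sketched in code. The example below uses the six total correlations and the value $\sigma_1 = 29.2$ quoted in the illustration of § 16; the function `partial_r` is a hypothetical helper that builds partial correlations by repeated use of the recursion given in equation (14). It evaluates $\sigma_{1.234}$ in the direct order (11) and in the inverse order (12).

```python
from math import sqrt

# Total correlations of the arithmetical illustration (Table I)
R0 = {frozenset({1, 2}): 0.52, frozenset({1, 3}): 0.41, frozenset({1, 4}): -0.14,
      frozenset({2, 3}): 0.49, frozenset({2, 4}): 0.23, frozenset({3, 4}): 0.25}

def partial_r(i, j, given=()):
    """r_{ij.given}, built up by the recursion of equation (14)."""
    if not given:
        return R0[frozenset({i, j})]
    *rest, k = given
    rest = tuple(rest)
    rij = partial_r(i, j, rest)
    rik = partial_r(i, k, rest)
    rjk = partial_r(j, k, rest)
    return (rij - rik * rjk) / sqrt((1 - rik**2) * (1 - rjk**2))

sigma1 = 29.2

# Equation (11): eliminate 2, then 3, then 4
direct = sigma1 * sqrt((1 - partial_r(1, 2)**2)
                       * (1 - partial_r(1, 3, (2,))**2)
                       * (1 - partial_r(1, 4, (2, 3))**2))

# Equation (12): eliminate 4, then 3, then 2
inverse = sigma1 * sqrt((1 - partial_r(1, 4)**2)
                        * (1 - partial_r(1, 3, (4,))**2)
                        * (1 - partial_r(1, 2, (3, 4))**2))

print(round(direct, 2), round(inverse, 2))
```

Both orders of elimination give the same value, about 22.8, which is the figure reported for $\sigma_{1.234}$ in § 16.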
13. Any regression of order $p$ may be expressed in terms of regressions
of order $p - 1$. For we have

$$\begin{aligned}
\Sigma(x_{1.34\ldots n}\,x_{2.34\ldots n}) &= \Sigma(x_{1.34\ldots(n-1)}\,x_{2.34\ldots n}) \\
  &= \Sigma(x_{1.34\ldots(n-1)})(x_2 - b_{2n.34\ldots(n-1)}\,x_n - \text{terms in } x_3 \text{ to } x_{n-1}) \\
  &= \Sigma(x_{1.34\ldots(n-1)}\,x_{2.34\ldots(n-1)}) - b_{2n.34\ldots(n-1)}\,\Sigma(x_{1.34\ldots(n-1)}\,x_{n.34\ldots(n-1)}).
\end{aligned}$$

That is, replacing $b_{2n.34\ldots(n-1)}$ by $b_{n2.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)}/\sigma^2_{n.34\ldots(n-1)}$,

$$b_{12.34\ldots n}\,\sigma^2_{2.34\ldots n} = b_{12.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)} - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)}.$$

Therefore, by equation (9),

$$b_{12.34\ldots n} = \frac{b_{12.34\ldots(n-1)} - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}}{1 - b_{2n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}}. \tag{13}$$

But this is simply the expression for $b_{12.n}$ in terms of $b_{12}$, $b_{1n}$, $b_{2n}$, and $b_{n2}$, with
the subscripts $34\ldots(n-1)$ added throughout. Therefore $b_{12.34\ldots n}$ may be regarded
as the partial regression of $x_{1.34\ldots(n-1)}$ on $x_{2.34\ldots(n-1)}$, $x_{n.34\ldots(n-1)}$ being given. As
any other secondary subscript might have been eliminated in lieu of $n$, we
can also regard it as the partial regression of $x_{1.45\ldots n}$ on $x_{2.45\ldots n}$, $x_{3.45\ldots n}$ being
given, and so on.

14. Equation (13) may be written in terms of the correlations:

$$b_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{1 - r^2_{2n.34\ldots(n-1)}}\cdot\frac{\sigma_{1.34\ldots(n-1)}}{\sigma_{2.34\ldots(n-1)}}.$$

Hence, writing down the similar expression for $b_{21.34\ldots n}$, and taking the
square root of the product,

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{(1 - r^2_{1n.34\ldots(n-1)})^{1/2}\,(1 - r^2_{2n.34\ldots(n-1)})^{1/2}}. \tag{14}$$

This is, similarly, the expression for $r_{12.n}$ in terms of $r_{12}$, $r_{1n}$, and $r_{2n}$, with
the secondary subscripts $34\ldots(n-1)$ added throughout, and accordingly $r_{12.34\ldots n}$
may be regarded as the partial correlation between $x_{1.34\ldots(n-1)}$ and $x_{2.34\ldots(n-1)}$,
$x_{n.34\ldots(n-1)}$ being given, and so on, as for the regression.
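
A single step of equation (14) is easily expressed as a function. The sketch below (the name `add_subscript` is a hypothetical helper, not part of the paper) applies it to the total correlations $r_{12} = 0.52$, $r_{13} = 0.41$, $r_{23} = 0.49$ of the illustration in § 16 to obtain the three first-order coefficients for the group of variables 1, 2, 3.

```python
from math import sqrt

def add_subscript(r12, r1n, r2n):
    """Equation (14): r_{12.n} from r_{12}, r_{1n}, r_{2n}.

    The same secondary subscripts may be attached to all three arguments,
    in which case they are carried by the result as well.
    """
    return (r12 - r1n * r2n) / sqrt((1 - r1n**2) * (1 - r2n**2))

r12, r13, r23 = 0.52, 0.41, 0.49   # total correlations from Table I

r12_3 = add_subscript(r12, r13, r23)
r13_2 = add_subscript(r13, r12, r23)
r23_1 = add_subscript(r23, r12, r13)

print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

To four places these agree with the first three rows of Table II (+0.4014, +0.2084, +0.3553).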

15. It is clear that equations (13) and (14) imply a series of relations 
between correlations or regressions of orders less than n — 2 with n variables, 
for all the expressions obtained by eliminating 3, 4, ..., $n$ in turn from the
secondary subscripts of the constant on the left must be equal to each other. 
Further, every coefficient of the pth order can be expressed in terms of the 
coefficients of the (p — l)th order in p different ways, by eliminating each of 
the p secondary subscripts in turn. This enables an absolute check to be 
kept on the arithmetic by calculating each coefficient in at least two distinct 
ways. 
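
The double calculation recommended in § 15 can be sketched as follows, using the first-order coefficients as printed in Table II. The helper `add_subscript` is a hypothetical name for one step of equation (14). The coefficient $r_{12.34}$ is reached once from the group with secondary subscript 3 and once from the group with secondary subscript 4.

```python
from math import sqrt

def add_subscript(r12, r1n, r2n):
    """One step of equation (14), adding the subscript n to r12."""
    return (r12 - r1n * r2n) / sqrt((1 - r1n**2) * (1 - r2n**2))

# First-order coefficients as printed in Table II
r12_3, r14_3, r24_3 = 0.4014, -0.2746, 0.1274
r12_4, r13_4, r23_4 = 0.5731, 0.4642, 0.4590

r12_34_from_group_3 = add_subscript(r12_3, r14_3, r24_3)   # adds 4 to r12.3
r12_34_from_group_4 = add_subscript(r12_4, r13_4, r23_4)   # adds 3 to r12.4

print(round(r12_34_from_group_3, 3), round(r12_34_from_group_4, 3))
```

The two routes agree to three places, differing only through the rounding of the four-figure inputs.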

16. By the use of equation (14), the work of calculating correlation 
coefficients of higher orders is rendered quite simple and straightforward. 
The use of equation (13) for calculating the regressions is comparatively 
clumsy, however: when the correlations have been found, it is best to work
out the standard deviations by equation (11), and then the regressions are
given at once by (8). The following data, taken from a discussion of
pauperism,* will serve as an arithmetical illustration, the variables being
the percentage changes during a decade in the poor-law unions of England in:
(1) the percentage of the population in receipt of poor-law relief; (2) the
ratio of the numbers given relief out-doors to one indoors (in the workhouse);
(3) the proportion of aged (over 60) in the population; (4) the population
itself. The values of the correlations of order zero are given in Table I, and
the logarithms of $(1 - r^2)^{1/2}$, required in the calculations, are entered in the
third column. These coefficients are next grouped in sets of three, one set to
each possible group of three variables, as in the second column of Table II,
and the coefficients of the first order are then calculated from (14). For
convenience in calculating the coefficients of the second order, the values of
$\log\,(1 - r^2)^{1/2}$ are again entered in the last column.

Table I.

            Correlation       $\log\sqrt{1 - r^2}$.
            coefficient.

    12        +0.52           $\bar{1}.93154$
    13        +0.41           $\bar{1}.96003$
    14        -0.14           $\bar{1}.99570$
    23        +0.49           $\bar{1}.94038$
    24        +0.23           $\bar{1}.98820$
    34        +0.25           $\bar{1}.98598$


Table II.

    Correlation          Product term     Numerator.    Correlation            $\log\sqrt{1 - r^2}$.
    coefficient          of numerator.                  coefficient
    (zero order).                                       (first order).

    12    +0.52          +0.2009          +0.3191       12.3    +0.4014        $\bar{1}.92370$
    13    +0.41          +0.2548          +0.1552       13.2    +0.2084        $\bar{1}.98071$
    23    +0.49          +0.2132          +0.2768       23.1    +0.3553        $\bar{1}.94139$

    12    +0.52          -0.0322          +0.5522       12.4    +0.5731        $\bar{1}.82709$
    14    -0.14          +0.1196          -0.2596       14.2    -0.3123        $\bar{1}.95544$
    24    +0.23          -0.0728          +0.3028       24.1    +0.3580        $\bar{1}.94044$

    13    +0.41          -0.0350          +0.4450       13.4    +0.4642        $\bar{1}.89460$
    14    -0.14          +0.1025          -0.2425       14.3    -0.2746        $\bar{1}.96595$
    34    +0.25          -0.0574          +0.3074       34.1    +0.3404        $\bar{1}.94652$

    23    +0.49          +0.0575          +0.4325       23.4    +0.4590        $\bar{1}.89726$
    24    +0.23          +0.1225          +0.1075       24.3    +0.1274        $\bar{1}.99290$
    34    +0.25          +0.1127          +0.1373       34.2    +0.1618        $\bar{1}.98848$


* 'Roy. Stat. Soc. Journ.,' vol. 62 (1899), p. 249.
Table III.

    Correlation          Product term     Numerator.    Correlation            $\log\sqrt{1 - r^2}$.
    coefficient          of numerator.                  coefficient
    (first order).                                      (second order).

    12.4    +0.5731      +0.2131          +0.3600       12.34    +0.458        $\bar{1}.89774$
    13.4    +0.4642      +0.2630          +0.2012       13.24    +0.276        $\bar{1}.96559$
    23.4    +0.4590      +0.2660          +0.1930       23.14    +0.266        $\bar{1}.96814$

    12.3    +0.4014      -0.0350          +0.4364       12.34    +0.458
    14.3    -0.2746      +0.0511          -0.3257       14.23    -0.359        $\bar{1}.94007$
    24.3    +0.1274      -0.1102          +0.2376       24.13    +0.270        $\bar{1}.96713$

    13.2    +0.2084      -0.0505          +0.2589       13.24    +0.276
    14.2    -0.3123      +0.0337          -0.3460       14.23    -0.359
    34.2    +0.1618      -0.0651          +0.2269       34.12    +0.244        $\bar{1}.97333$

    23.1    +0.3553      +0.1219          +0.2334       23.14    +0.266
    24.1    +0.3580      +0.1209          +0.2371       24.13    +0.270
    34.1    +0.3404      +0.1272          +0.2132       34.12    +0.244

The first order coefficients, from Table II, are then regrouped according to 
the same primary subscripts as in Table I, and the work repeated precisely as 
before, as in Table III, but each coefficient of the second order is automatically 
calculated by this process in two ways and the work thus checked. Small 
errors introduced by the non-retention of insignificant figures may, of course, 
prevent complete agreement to the last place of decimals, and for this reason 
the coefficients of the first order were evaluated to four figures, although 
only three were required for the final result. In order to obtain the regression 
equation between changes in pauperism and changes in the three remaining 
variables, we require the three regressions $b_{12.34}$, $b_{13.24}$, and $b_{14.23}$ and,
accordingly, must obtain the six standard deviations, $\sigma_{1.34}$, $\sigma_{2.34}$, $\sigma_{1.24}$, $\sigma_{3.24}$,
$\sigma_{1.23}$, $\sigma_{4.23}$.

These are readily calculated and checked by means of the equations of the
form

$$\begin{aligned}
\sigma_{1.34} &= \sigma_1\,(1 - r^2_{13})^{1/2}\,(1 - r^2_{14.3})^{1/2} \\
  &= \sigma_1\,(1 - r^2_{14})^{1/2}\,(1 - r^2_{13.4})^{1/2},
\end{aligned}$$

given $\sigma_1 = 29.2$, $\sigma_2 = 41.7$, $\sigma_3 = 5.5$, $\sigma_4 = 23.8$; and the values found are:

    $\sigma_{1.34} = 25.61$,    $\sigma_{1.24} = 23.69$,    $\sigma_{1.23} = 24.39$,
    $\sigma_{2.34} = 36.06$,    $\sigma_{3.24} = 4.73$,     $\sigma_{4.23} = 22.86$.

Hence, from the equations of the form

$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.34}}{\sigma_{2.34}},$$

we have

$$b_{12.34} = +0.325, \qquad b_{13.24} = +1.383, \qquad b_{14.23} = -0.383.$$

That is, the regression equation between changes in pauperism and changes
in the other factors considered is

$$x_1 = 0.325\,x_2 + 1.383\,x_3 - 0.383\,x_4.$$

To complete the work, we may calculate $\sigma_{1.234}$, the standard error made in
estimating $x_1$ from $x_2$, $x_3$, and $x_4$ by the above equation. The value is

$$\begin{aligned}
\sigma_{1.234} &= \sigma_1\,(1 - r^2_{12})^{1/2}\,(1 - r^2_{13.2})^{1/2}\,(1 - r^2_{14.23})^{1/2} \\
  &= \sigma_1\,(1 - r^2_{14})^{1/2}\,(1 - r^2_{13.4})^{1/2}\,(1 - r^2_{12.34})^{1/2} \\
  &= 22.8.
\end{aligned}$$
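
The whole of the arithmetic of § 16 can be reproduced in a few lines. The following is a sketch only, working from the total correlations of Table I and the four standard deviations quoted above. The helper names `partial_r`, `partial_sigma` and `partial_b` are hypothetical, and the code carries full floating-point precision rather than the four-figure logarithmic working of the tables.

```python
from math import sqrt

# Total correlations (Table I) and total standard deviations of the illustration
R0 = {frozenset({1, 2}): 0.52, frozenset({1, 3}): 0.41, frozenset({1, 4}): -0.14,
      frozenset({2, 3}): 0.49, frozenset({2, 4}): 0.23, frozenset({3, 4}): 0.25}
SIGMA = {1: 29.2, 2: 41.7, 3: 5.5, 4: 23.8}

def partial_r(i, j, given=()):
    """r_{ij.given}, by repeated use of equation (14)."""
    if not given:
        return R0[frozenset({i, j})]
    *rest, k = given
    rest = tuple(rest)
    rij, rik, rjk = partial_r(i, j, rest), partial_r(i, k, rest), partial_r(j, k, rest)
    return (rij - rik * rjk) / sqrt((1 - rik**2) * (1 - rjk**2))

def partial_sigma(i, given=()):
    """sigma_{i.given}, by the product formula of equation (11)."""
    s, used = SIGMA[i], ()
    for k in given:
        s *= sqrt(1 - partial_r(i, k, used)**2)
        used += (k,)
    return s

def partial_b(i, j, given):
    """b_{ij.given}, by equation (8)."""
    return partial_r(i, j, given) * partial_sigma(i, given) / partial_sigma(j, given)

b12_34 = partial_b(1, 2, (3, 4))
b13_24 = partial_b(1, 3, (2, 4))
b14_23 = partial_b(1, 4, (2, 3))
sigma1_234 = partial_sigma(1, (2, 3, 4))

print(round(b12_34, 3), round(b13_24, 3), round(b14_23, 3), round(sigma1_234, 1))
```

The three regressions agree with the printed values +0.325, +1.383 and -0.383 to within a unit in the third decimal place, and $\sigma_{1.234}$ comes out at 22.8.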

17. If, in accordance with the notation used for elementary cases in the
paper already referred to, and that in a recent note by Mr. R. H. Hooker and
myself,* we write

$$\sigma^2_{1.23\ldots n} = \sigma^2_1\,(1 - R^2_{1(23\ldots n)}), \tag{15}$$

$R_{1(23\ldots n)}$ may be regarded as a coefficient of correlation between $x_1$ and the
expression

$$e_{1.23\ldots n} = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n. \tag{16}$$

The value of $R$ is accordingly a useful datum, as indicating how closely $x_1$
can be expressed in terms of a linear function of $x_2, x_3, \ldots, x_n$. It may be readily
calculated either direct from the equation

$$1 - R^2_{1(23\ldots n)} = (1 - r^2_{12})(1 - r^2_{13.2})\cdots(1 - r^2_{1n.23\ldots(n-1)}), \tag{17}$$

or from the value of $\sigma_{1.23\ldots n}$ and $\sigma_1$, if previously obtained.

It is obvious from (17) that, since every bracket on the right is not greater
than unity,

$$1 - R^2_{1(23\ldots n)} \le 1 - r^2_{12}.$$

Hence $R_{1(23\ldots n)}$ cannot be numerically less than $r_{12}$. For the same reason,
rewriting (17) in every possible form, $R_{1(23\ldots n)}$ cannot be numerically less than
$r_{12}, r_{13}, \ldots, r_{1n}$, i.e., any one of the possible constituent coefficients of order
zero. Further, for similar reasons, $R_{1(23\ldots n)}$ cannot be numerically less than
any possible constituent coefficient of any higher order. That is to say,
$R_{1(23\ldots n)}$ is not less than the greatest of all the possible constituent coefficients
of all orders, and is usually, though not always, markedly greater. Thus in
the illustration of § 16, the value of $R_{1(234)}$ is 0.626, and the greatest correlation
coefficient is $r_{12.34} = 0.458$. The sign of $R$ is necessarily positive, for a
positive increment in $x_1$ obviously corresponds on the average to a positive
increment in $e_{1.23\ldots n}$. More definitely, the standard deviation of $e_{1.23\ldots n}$ is
$\sigma_1 R_{1(23\ldots n)}$, and the regression of $x_1$ on $e_{1.23\ldots n}$ is therefore $+1$.

Seeing that $\sigma^2_{1.23\ldots n} = \sigma^2_1\,(1 - R^2_{1(23\ldots n)})$, and that $\sigma_{1.23\ldots n}$ is a minimum, we
may, alternatively, regard the values of the regressions as determined by the

* 'Roy. Stat. Soc. Journal,' vol. 69 (1906), p. 197.
VOL. LXXIX. — A. 
condition that the correlation between $x_1$ and $e_{1.23\ldots n}$, viz., $R_{1(23\ldots n)}$, shall be a
maximum.
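
The properties of $R$ stated in § 17 can be illustrated on a sample. The sketch below assumes NumPy and synthetic data for four variables (the data and names are assumptions for illustration only). It forms $e_{1.234}$ from the least-squares regressions and checks that $R$ as defined by (15) equals the correlation between $x_1$ and $e_{1.234}$, that the standard deviation of $e_{1.234}$ is $\sigma_1 R$, and that the regression of $x_1$ on $e_{1.234}$ is unity.

```python
import numpy as np

rng = np.random.default_rng(4)   # synthetic sample, assumed only for illustration
raw = rng.normal(size=(250, 4))
raw[:, 0] += 0.6 * raw[:, 1] - 0.4 * raw[:, 2] + 0.2 * raw[:, 3]
x = raw - raw.mean(axis=0)
x1, others = x[:, 0], x[:, 1:]
n_obs = len(x1)

b, *_ = np.linalg.lstsq(others, x1, rcond=None)
e = others @ b                   # the linear expression of equation (16)
resid = x1 - e                   # the deviation x_{1.234}

sigma1 = np.sqrt(x1 @ x1 / n_obs)
sigma1_234 = np.sqrt(resid @ resid / n_obs)
R = np.sqrt(1 - sigma1_234**2 / sigma1**2)     # equation (15)

corr_x1_e = np.corrcoef(x1, e)[0, 1]           # correlation between x1 and e
sd_e = np.sqrt(e @ e / n_obs)                  # standard deviation of e
slope_x1_on_e = (x1 @ e) / (e @ e)             # regression of x1 on e

print(R, corr_x1_e, sd_e / sigma1, slope_x1_on_e)
```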

18. It is obvious that equations (13) and (14) imply relations of an inverse
kind, expressing coefficients of a lower order in terms of those of a higher
order. Using the same method of expansion as in previous cases, we have

$$\begin{aligned}
0 &= \Sigma(x_{1.23\ldots n}\,x_{2.34\ldots(n-1)}) \\
  &= \Sigma(x_1\,x_{2.34\ldots(n-1)}) - b_{12.34\ldots n}\,\Sigma(x_2\,x_{2.34\ldots(n-1)}) - b_{1n.23\ldots(n-1)}\,\Sigma(x_n\,x_{2.34\ldots(n-1)}).
\end{aligned}$$

That is

$$b_{12.34\ldots(n-1)} = b_{12.34\ldots n} + b_{1n.23\ldots(n-1)}\,b_{n2.34\ldots(n-1)}.$$

But by interchanging the suffixes, viz., 1 for $n$ and $n$ for 1,

$$b_{n2.34\ldots(n-1)} = b_{n2.13\ldots(n-1)} + b_{n1.23\ldots(n-1)}\,b_{12.34\ldots(n-1)}.$$

Substituting this value of $b_{n2.34\ldots(n-1)}$ in the first equation and simplifying,

$$b_{12.34\ldots(n-1)} = \frac{b_{12.34\ldots n} + b_{1n.23\ldots(n-1)}\,b_{n2.13\ldots(n-1)}}{1 - b_{1n.23\ldots(n-1)}\,b_{n1.23\ldots(n-1)}}. \tag{18}$$

This is the required equation for the regressions. The similar equation for
the correlations is obtained at once by writing down the corresponding
expression for $b_{21.34\ldots(n-1)}$ and taking the square root

$$r_{12.34\ldots(n-1)} = \frac{r_{12.34\ldots n} + r_{1n.23\ldots(n-1)}\,r_{2n.13\ldots(n-1)}}{(1 - r^2_{1n.23\ldots(n-1)})^{1/2}\,(1 - r^2_{2n.13\ldots(n-1)})^{1/2}}. \tag{19}$$
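
The inverse relation (19) can be checked against the direct relation (14) in the three-variable case. In the sketch below the function names `raise_order` and `lower_order` are hypothetical. The first-order coefficients are computed from the total correlations of § 16, and the total correlation $r_{12}$ is then recovered from them.

```python
from math import sqrt

def raise_order(r12, r1n, r2n):
    """Equation (14): r_{12.n} from r_{12}, r_{1n}, r_{2n}."""
    return (r12 - r1n * r2n) / sqrt((1 - r1n**2) * (1 - r2n**2))

def lower_order(r12_n, r1n_2, r2n_1):
    """Equation (19): r_{12} from r_{12.n}, r_{1n.2}, r_{2n.1}."""
    return (r12_n + r1n_2 * r2n_1) / sqrt((1 - r1n_2**2) * (1 - r2n_1**2))

r12, r13, r23 = 0.52, 0.41, 0.49

r12_3 = raise_order(r12, r13, r23)
r13_2 = raise_order(r13, r12, r23)
r23_1 = raise_order(r23, r12, r13)

recovered_r12 = lower_order(r12_3, r13_2, r23_1)
print(recovered_r12)
```

The recovered value reproduces $r_{12} = 0.52$ up to floating-point rounding.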

19. The general principle that any equation subsisting between such
statistical constants as correlations, regressions, and standard deviations
holds good for all secondary subscripts, applies also to the equation (3), which
expresses the individual deviation of order $p$ in terms of deviations of order
zero. That is to say, we have, quite generally, $k$ being any subscript or
collection of subscripts,

$$x_{1.2\ldots kn} = x_{1.k} - b_{12.3\ldots kn}\,x_{2.k} - \cdots - b_{1n.2\ldots k(n-1)}\,x_{n.k}. \tag{20}$$

For, if $l$ be any one of the subscripts included under $k$, and if $m$ denote the
remaining subscripts, on expanding both sides of the equation in terms of
deviations of order zero, the coefficients of $x_1, x_2, \ldots, x_n$ are the same. The
coefficients of $x_l$ are equal if

$$b_{1l.2\ldots mn} = b_{1l.m} - b_{12.3\ldots kn}\,b_{2l.m} - \cdots - b_{1n.2\ldots k(n-1)}\,b_{nl.m}.$$

But, replacing the regressions $b_{1l.m}, b_{2l.m}, \ldots, b_{nl.m}$ by product sums, this
reduces to

$$\Sigma(x_{1.2\ldots kn}\,x_{l.m}) = 0,$$

which is true by § 8, whether $m$ denote a single subscript or an aggregate, or
is absent, and equation (20) is accordingly correct. Remembering that

$$b_{12.3\ldots kn} = r_{12.3\ldots kn}\,\frac{\sigma_{1.3\ldots kn}}{\sigma_{2.3\ldots kn}} = r_{12.3\ldots kn}\,\frac{\sigma_{1.2\ldots kn}}{\sigma_{2.13\ldots kn}},$$

the equation may also be written in the useful form

$$\frac{x_{1.2\ldots kn}}{\sigma_{1.2\ldots kn}} = \frac{x_{1.k}}{\sigma_{1.2\ldots kn}} - r_{12.3\ldots kn}\,\frac{x_{2.k}}{\sigma_{2.13\ldots kn}} - \cdots - r_{1n.2\ldots k(n-1)}\,\frac{x_{n.k}}{\sigma_{n.12\ldots k(n-1)}}. \tag{21}$$

20. In all the preceding sections no assumption of any kind has been
made with respect to the form of the distribution of frequency, but the
results may, of course, be applied to the special case of the normal
distribution. Let $y_{12\ldots n}$ denote the value of the normal function for the
combination of deviations $x_1, x_2, \ldots, x_n$, and $y'_{12\ldots n}$ the value of the function
when all deviations are zero, then we may write

$$y_{12\ldots n} = y'_{12\ldots n}\,\exp\{-\tfrac{1}{2}\,\phi(x_1, x_2, \ldots, x_n)\}, \tag{22}$$

the form of the function $\phi$ being determined by the fact that the distribution
of every array must be normal, and that the mean of the array of any one
variable associated with given types of the others must be the linear function
of those types given by the general regression equation of the form (1). We
must have, accordingly

$$\begin{aligned}
\phi = {} & \frac{x_1^2}{\sigma^2_{1.23\ldots n}} + \frac{x_2^2}{\sigma^2_{2.13\ldots n}} + \cdots + \frac{x_n^2}{\sigma^2_{n.12\ldots(n-1)}} \\
 & - 2\,r_{12.3\ldots n}\,\frac{x_1 x_2}{\sigma_{1.23\ldots n}\,\sigma_{2.13\ldots n}} - \cdots - 2\,r_{(n-1)n.12\ldots(n-2)}\,\frac{x_{n-1}\,x_n}{\sigma_{(n-1).12\ldots(n-2)n}\,\sigma_{n.12\ldots(n-1)}}.
\end{aligned} \tag{23}$$

But this expression may be thrown into several different forms. Thus,
replacing the correlated variables, $x_1, x_2, \ldots, x_n$, by the independent variables,
$x_1, x_{2.1}, x_{3.12}, \ldots, x_{n.12\ldots(n-1)}$, we have the very useful form

$$\phi = \frac{x_1^2}{\sigma_1^2} + \frac{x^2_{2.1}}{\sigma^2_{2.1}} + \frac{x^2_{3.12}}{\sigma^2_{3.12}} + \cdots + \frac{x^2_{n.12\ldots(n-1)}}{\sigma^2_{n.12\ldots(n-1)}}. \tag{24}$$

This expression may be shown to be identical with (23) by expanding in
terms of deviations of order zero, and reducing the coefficients of the square
terms by means of the equation

$$\frac{b^2_{n1.k}}{\sigma^2_{n.1k}} + \frac{1}{\sigma^2_{1.k}} = \frac{1}{\sigma^2_{1.nk}},$$

and those of the product terms by an equation derived at once from (19),

$$\frac{r_{13.2k}\,r_{23.1k}}{\sigma_{1.23k}\,\sigma_{2.13k}} = \frac{r_{12.k}}{\sigma_{1.2k}\,\sigma_{2.1k}} - \frac{r_{12.3k}}{\sigma_{1.23k}\,\sigma_{2.13k}}.$$
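
The identity of the two forms (23) and (24) of the exponent can be verified numerically for three variables. The sketch below assumes NumPy and works with a hypothetical population covariance matrix `C` and an arbitrary set of deviations `x`; the helper `cond_cov` is likewise an assumption introduced for illustration. Both forms are compared with the quadratic form built from the inverse of `C`.

```python
import numpy as np

def cond_cov(C, idx, given):
    """Covariance of the variables idx after removing the linear effect of given."""
    idx, given = list(idx), list(given)
    S = C[np.ix_(idx, idx)]
    if given:
        G = C[np.ix_(idx, given)]
        S = S - G @ np.linalg.solve(C[np.ix_(given, given)], G.T)
    return S

# Hypothetical covariance matrix and deviations, assumed only for illustration
C = np.array([[4.0, 1.2, 0.6],
              [1.2, 2.0, 0.5],
              [0.6, 0.5, 1.0]])
x = np.array([0.7, -1.1, 0.4])
n = len(x)

# Form (24): independent variables x1, x2.1, x3.12 with their variances
phi_24 = 0.0
for i in range(n):
    prev = list(range(i))
    resid = x[i]
    if prev:
        b = np.linalg.solve(C[np.ix_(prev, prev)], C[prev, i])
        resid = x[i] - b @ x[prev]
    phi_24 += resid**2 / cond_cov(C, [i], prev)[0, 0]

# Form (23): highest-order standard deviations and partial correlations
sig = [np.sqrt(cond_cov(C, [i], [j for j in range(n) if j != i])[0, 0])
       for i in range(n)]
phi_23 = sum(x[i]**2 / sig[i]**2 for i in range(n))
for i in range(n):
    for j in range(i + 1, n):
        rest = [k for k in range(n) if k not in (i, j)]
        S = cond_cov(C, [i, j], rest)
        r_ij = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
        phi_23 -= 2 * r_ij * x[i] * x[j] / (sig[i] * sig[j])

phi_direct = x @ np.linalg.solve(C, x)   # quadratic form with the inverse of C
print(phi_23, phi_24, phi_direct)
```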
21. Several important results follow at once from the form of the expression
(24) for the exponent $\phi$. Since the variables are independent, the central
value of the normal function, $y'_{12\ldots n}$, must be given by the product of the
well-known expressions for the single variables, i.e., we must have

$$y'_{12\ldots n} = \frac{N}{(2\pi)^{n/2}\,\sigma_1\,\sigma_{2.1}\,\sigma_{3.12}\cdots\sigma_{n.12\ldots(n-1)}}. \tag{25}$$

22. Again, if we integrate the normal function in the form (24) with
respect to $x_1$, treating the remaining variables, $x_{2.1}$, $x_{3.21}$, etc., as constants of
integration, $\sigma_1$ is eliminated from $y'$ and $x_1$ from $\phi$, and all the remaining
variables in the exponent contain the secondary suffix 1. If $x_{2.1}, x_{3.21},
\ldots, x_{n.12\ldots(n-1)}$ are then replaced by $x_{2.1}, x_{3.1}, \ldots, x_{n.1}$, $\phi$ may be written in the
form (23) for these variables. Similarly, if we integrate again with respect to
$x_{2.1}$, $\sigma_{2.1}$ is removed from $y'$ and $x_{2.1}$ from $\phi$, and all the remaining variables
in the exponent contain both secondary suffixes 1 and 2. If $x_{3.21}, x_{4.321},
\ldots, x_{n.12\ldots(n-1)}$ are then replaced by $x_{3.21}, x_{4.21}, \ldots, x_{n.21}$, $\phi$ may be written in the form (23)
for these variables. Clearly the process may be continued on the same lines.
The correlation between all sets of deviations, of any one order, with the same
secondary suffixes, is therefore normal correlation.

23. It follows that we may generalise at once the known formulas for the
probable errors of the constants of a normal distribution. Omitting the
factor 0.674489... we have, standard error of a

    Standard deviation $\sigma_1$             $\sigma_1/\sqrt{2N}$.
    Correlation coefficient $r_{12}$       $(1 - r^2_{12})/\sqrt{N}$.
    Regression coefficient $b_{12}$        $\sigma_{1.2}/(\sigma_2\sqrt{N})$.

The first is a well-known result; the last two are cited from the valuable
memoir by Professor Karl Pearson and Mr. L. N. G. Filon.* But since $\sigma_{1.k}$ is
the standard deviation of the normally distributed variable $x_{1.k}$, $r_{12.k}$ the
correlation between the normally distributed variables $x_{1.k}$ and $x_{2.k}$, and $b_{12.k}$
the regression of $x_{1.k}$ on $x_{2.k}$, we must have, quite generally, $k$ denoting as
before either a single subscript or an aggregate, standard error of a

    Standard deviation $\sigma_{1.k}$           $\sigma_{1.k}/\sqrt{2N}$.
    Correlation coefficient $r_{12.k}$     $(1 - r^2_{12.k})/\sqrt{N}$.
    Regression coefficient $b_{12.k}$      $\sigma_{1.2k}/(\sigma_{2.k}\sqrt{N})$.          (26)

The last result may be readily verified against the formula arrived at by 
Professor Pearson and Mr. Filon, for the case of three variables, after pages of 
the most laborious work.† The first may be checked for the case of two

* 'Phil. Trans.,' A (1898), vol. 191, p. 229.
† Loc. cit., equation xxxviii, p. 260.

variables, remembering the result of the same writers,* that the correlation
between errors in $\sigma_1$ and in $r_{12}$ is $r_{12}/\sqrt{2}$; for we have

$$\sigma_{1.2} = \sigma_1\,(1 - r^2_{12})^{1/2},$$

$$\frac{d\sigma_{1.2}}{\sigma_{1.2}} = \frac{d\sigma_1}{\sigma_1} - \frac{r_{12}\,dr_{12}}{1 - r^2_{12}}.$$

Or, squaring both sides of the equation and summing, using $\epsilon_{1.2}$ to denote the
standard error of $\sigma_{1.2}$,

$$\frac{\epsilon^2_{1.2}}{\sigma^2_{1.2}} = \frac{1}{2N} + \frac{r^2_{12}}{N} - 2\,\frac{r_{12}}{1 - r^2_{12}}\cdot\frac{r_{12}}{\sqrt{2}}\cdot\frac{1}{\sqrt{2N}}\cdot\frac{1 - r^2_{12}}{\sqrt{N}} = \frac{1}{2N}.$$

24. The question of errors of sampling in the case of the coefficient of $n$-fold
correlation, $R$, is not so simple, owing to the fact that the sign of the
coefficient is essentially positive and, consequently, it is subject to biassed
error. If, for instance, a series of variables are strictly independent, but
values are found for $r_{12}$, $r_{13.2}$, $r_{14.32}$, etc., equal to $\delta_2, \delta_3, \delta_4, \ldots$ then

$$1 - R^2_{1(23\ldots n)} = (1 - \delta_2^2)(1 - \delta_3^2)\cdots(1 - \delta_n^2).$$

If the $\delta$'s are sufficiently small to enable us to neglect terms of the fourth
order as compared with those of the second order, then we may write to the
first approximation,

$$R^2_{1(23\ldots n)} = \delta_2^2 + \delta_3^2 + \cdots + \delta_n^2.$$

Or, summing for a number of samplings and substituting $1/N$ for $\Sigma(\delta^2)$ in
each case, the root-mean-square value of $R$ when the variables are strictly
independent is

$$R = \sqrt{\frac{n - 1}{N}}, \tag{27}$$

$n$ being the number of variables and $N$ the number of observations. $R$
cannot be held with certainty to be of definite significance if not markedly
greater than this, and if the number of observations be small compared with
the number of variables, the critical value is rather unpleasantly large. Thus
in the case of a recent investigation by Mr. R. H. Hooker into the relation
between the weather and the crops, $n = 3$, $N = 21$, consequently
$R = \sqrt{2/21} = 0.31$ (the value cited by him on my authority).† Clearly, if the
number of observations be small, it is not worth while dealing with a large
number of variables.
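
Equation (27) can be compared with a small sampling experiment. The following is a rough sketch, assuming NumPy and strictly independent normal variables (the seed, the number of trials and the function name `chance_R` are assumptions for illustration). Since (27) is a first approximation, close but not exact agreement with the simulated root-mean-square value is to be expected.

```python
import numpy as np

def chance_R(n_vars, n_obs):
    """Equation (27): root-mean-square R for strictly independent variables."""
    return np.sqrt((n_vars - 1) / n_obs)

rng = np.random.default_rng(5)   # simulation settings are illustrative assumptions
n_vars, n_obs, trials = 3, 21, 2000

r_squared = np.empty(trials)
for t in range(trials):
    x = rng.normal(size=(n_obs, n_vars))       # independent variables
    x -= x.mean(axis=0)                        # deviations from the means
    x1, others = x[:, 0], x[:, 1:]
    b, *_ = np.linalg.lstsq(others, x1, rcond=None)
    resid = x1 - others @ b
    r_squared[t] = 1 - (resid @ resid) / (x1 @ x1)   # R^2 by equation (15)

rms_R = np.sqrt(r_squared.mean())
print(round(chance_R(n_vars, n_obs), 2), round(rms_R, 2))
```

For $n = 3$ and $N = 21$ the formula gives 0.31, the value quoted in the text, and the simulated value lies close to it.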

* Loc. cit., equation xviii, p. 242.

† 'Roy. Stat. Soc. Journ.,' vol. 70 (1907), p. 7.



